Вам дана выгрузка событий из web-воронки со следующими типами событий:

  • onboarding_start - начало прохождения воронки
  • profile_start - пользователь начал заполнение анкеты
  • email_submit - пользователь ввел свой почтовый адрес
  • paywall_show - показ пэйвола
  • payment_done - пользователь совершил покупку

Посчитайте какая доля пользователей от изначального проходит до каждого из этапов воронки и какая доля теряется на каждом из этапов. Используя событие experiment_exposure, выберете 3 наиболее перспективных теста, свой выбор обоснуйте.

In [1]:
import pandas as pd
import plotly.express as px

from plotly.subplots import make_subplots
import plotly.graph_objects as go
import math


from statsmodels.stats.proportion import proportions_ztest

Data Uploading¶

In [2]:
df = pd.read_csv("simple_interview_events.csv")
df
Out[2]:
user_id event_type event_time event_params
0 32001 onboarding_start 2024-01-01T00:01:40 {"funnel_type": "female"}
1 99564 onboarding_start 2024-01-01T00:01:53 {"funnel_type": "male"}
2 32001 profile_start 2024-01-01T00:01:58 {}
3 99564 profile_start 2024-01-01T00:02:07 {}
4 71575 onboarding_start 2024-01-01T00:02:18 {"funnel_type": "female"}
... ... ... ... ...
346323 54690 paywall_show 2024-04-01T00:01:40 {}
346324 11354 paywall_show 2024-04-01T00:02:51 {}
346325 64903 paywall_show 2024-04-01T00:04:27 {}
346326 41417 paywall_show 2024-04-01T00:05:01 {}
346327 66780 paywall_show 2024-04-01T00:05:25 {}

346328 rows × 4 columns

Data Preprocessing¶

In [3]:
unique_values_costs = {col: df[col].unique() for col in df.columns}
unique_values_costs
Out[3]:
{'user_id': array([32001, 99564, 71575, ..., 64903, 41417, 66780], shape=(100000,)),
 'event_type': array(['onboarding_start', 'profile_start', 'email_submit',
        'paywall_show', 'payment_done', 'experiment_exposure'],
       dtype=object),
 'event_time': array(['2024-01-01T00:01:40', '2024-01-01T00:01:53',
        '2024-01-01T00:01:58', ..., '2024-04-01T00:04:27',
        '2024-04-01T00:05:01', '2024-04-01T00:05:25'],
       shape=(314666,), dtype=object),
 'event_params': array(['{"funnel_type": "female"}', '{"funnel_type": "male"}', '{}',
        '{"funnel_type": "main"}',
        '{"experiment_name": "exp_5", "experiment_group": "test"}',
        '{"experiment_name": "exp_5", "experiment_group": "control"}',
        '{"experiment_name": "exp_1", "experiment_group": "test"}',
        '{"experiment_name": "exp_1", "experiment_group": "control"}',
        '{"experiment_name": "exp_8", "experiment_group": "test"}',
        '{"experiment_name": "exp_8", "experiment_group": "control"}',
        '{"experiment_name": "exp_2", "experiment_group": "control"}',
        '{"experiment_name": "exp_2", "experiment_group": "test"}',
        '{"experiment_name": "exp_7", "experiment_group": "control"}',
        '{"experiment_name": "exp_7", "experiment_group": "test"}',
        '{"experiment_name": "exp_0", "experiment_group": "test"}',
        '{"experiment_name": "exp_0", "experiment_group": "control"}',
        '{"experiment_name": "exp_6", "experiment_group": "control"}',
        '{"experiment_name": "exp_6", "experiment_group": "test"}',
        '{"experiment_name": "exp_9", "experiment_group": "control"}',
        '{"experiment_name": "exp_9", "experiment_group": "test"}'],
       dtype=object)}

парсинг данных в event_params

In [4]:
df["funnel_type"] = df["event_params"].str.extract(r'"funnel_type":\s*"([^"]+)"')
df
Out[4]:
user_id event_type event_time event_params funnel_type
0 32001 onboarding_start 2024-01-01T00:01:40 {"funnel_type": "female"} female
1 99564 onboarding_start 2024-01-01T00:01:53 {"funnel_type": "male"} male
2 32001 profile_start 2024-01-01T00:01:58 {} NaN
3 99564 profile_start 2024-01-01T00:02:07 {} NaN
4 71575 onboarding_start 2024-01-01T00:02:18 {"funnel_type": "female"} female
... ... ... ... ... ...
346323 54690 paywall_show 2024-04-01T00:01:40 {} NaN
346324 11354 paywall_show 2024-04-01T00:02:51 {} NaN
346325 64903 paywall_show 2024-04-01T00:04:27 {} NaN
346326 41417 paywall_show 2024-04-01T00:05:01 {} NaN
346327 66780 paywall_show 2024-04-01T00:05:25 {} NaN

346328 rows × 5 columns

In [5]:
df["experiment_name"] = df["event_params"].str.extract(r'"experiment_name":\s*"([^"]+)"')
df
Out[5]:
user_id event_type event_time event_params funnel_type experiment_name
0 32001 onboarding_start 2024-01-01T00:01:40 {"funnel_type": "female"} female NaN
1 99564 onboarding_start 2024-01-01T00:01:53 {"funnel_type": "male"} male NaN
2 32001 profile_start 2024-01-01T00:01:58 {} NaN NaN
3 99564 profile_start 2024-01-01T00:02:07 {} NaN NaN
4 71575 onboarding_start 2024-01-01T00:02:18 {"funnel_type": "female"} female NaN
... ... ... ... ... ... ...
346323 54690 paywall_show 2024-04-01T00:01:40 {} NaN NaN
346324 11354 paywall_show 2024-04-01T00:02:51 {} NaN NaN
346325 64903 paywall_show 2024-04-01T00:04:27 {} NaN NaN
346326 41417 paywall_show 2024-04-01T00:05:01 {} NaN NaN
346327 66780 paywall_show 2024-04-01T00:05:25 {} NaN NaN

346328 rows × 6 columns

In [6]:
df["experiment_group"] = df["event_params"].str.extract(r'"experiment_group":\s*"([^"]+)"')
df
Out[6]:
user_id event_type event_time event_params funnel_type experiment_name experiment_group
0 32001 onboarding_start 2024-01-01T00:01:40 {"funnel_type": "female"} female NaN NaN
1 99564 onboarding_start 2024-01-01T00:01:53 {"funnel_type": "male"} male NaN NaN
2 32001 profile_start 2024-01-01T00:01:58 {} NaN NaN NaN
3 99564 profile_start 2024-01-01T00:02:07 {} NaN NaN NaN
4 71575 onboarding_start 2024-01-01T00:02:18 {"funnel_type": "female"} female NaN NaN
... ... ... ... ... ... ... ...
346323 54690 paywall_show 2024-04-01T00:01:40 {} NaN NaN NaN
346324 11354 paywall_show 2024-04-01T00:02:51 {} NaN NaN NaN
346325 64903 paywall_show 2024-04-01T00:04:27 {} NaN NaN NaN
346326 41417 paywall_show 2024-04-01T00:05:01 {} NaN NaN NaN
346327 66780 paywall_show 2024-04-01T00:05:25 {} NaN NaN NaN

346328 rows × 7 columns

In [7]:
df = df.drop(columns=["event_params"])
df
Out[7]:
user_id event_type event_time funnel_type experiment_name experiment_group
0 32001 onboarding_start 2024-01-01T00:01:40 female NaN NaN
1 99564 onboarding_start 2024-01-01T00:01:53 male NaN NaN
2 32001 profile_start 2024-01-01T00:01:58 NaN NaN NaN
3 99564 profile_start 2024-01-01T00:02:07 NaN NaN NaN
4 71575 onboarding_start 2024-01-01T00:02:18 female NaN NaN
... ... ... ... ... ... ...
346323 54690 paywall_show 2024-04-01T00:01:40 NaN NaN NaN
346324 11354 paywall_show 2024-04-01T00:02:51 NaN NaN NaN
346325 64903 paywall_show 2024-04-01T00:04:27 NaN NaN NaN
346326 41417 paywall_show 2024-04-01T00:05:01 NaN NaN NaN
346327 66780 paywall_show 2024-04-01T00:05:25 NaN NaN NaN

346328 rows × 6 columns

тестовый юзер

In [8]:
df[df["user_id"] == 91875]
Out[8]:
user_id event_type event_time funnel_type experiment_name experiment_group
34702 91875 experiment_exposure 2024-01-10T07:21:02 NaN exp_8 test
34703 91875 onboarding_start 2024-01-10T07:21:02 male NaN NaN
34705 91875 profile_start 2024-01-10T07:21:16 NaN NaN NaN
34708 91875 email_submit 2024-01-10T07:21:36 NaN NaN NaN
34717 91875 paywall_show 2024-01-10T07:28:28 NaN NaN NaN

заполняем NaN известными значениями

In [9]:
df = df.sort_values(["user_id", "event_time"]) 
df
Out[9]:
user_id event_type event_time funnel_type experiment_name experiment_group
193355 0 onboarding_start 2024-02-20T16:21:51 male NaN NaN
193356 0 profile_start 2024-02-20T16:22:05 NaN NaN NaN
193357 0 email_submit 2024-02-20T16:22:25 NaN NaN NaN
193375 0 paywall_show 2024-02-20T16:29:17 NaN NaN NaN
317565 1 onboarding_start 2024-03-24T07:48:11 main NaN NaN
... ... ... ... ... ... ...
86357 99998 paywall_show 2024-01-23T07:49:26 NaN NaN NaN
300974 99999 onboarding_start 2024-03-20T00:41:12 female NaN NaN
300975 99999 profile_start 2024-03-20T00:41:30 NaN NaN NaN
300976 99999 email_submit 2024-03-20T00:41:55 NaN NaN NaN
300999 99999 paywall_show 2024-03-20T00:49:25 NaN NaN NaN

346328 rows × 6 columns

In [10]:
df[['experiment_name', 'experiment_group', 'funnel_type']] = (
    df.groupby('user_id')[['experiment_name', 'experiment_group', 'funnel_type']]
    .transform(lambda x: x.ffill().bfill())
)
/var/folders/cj/5yw5vpbx3w793bdrt43m_7zm0000gn/T/ipykernel_76229/182570417.py:3: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  .transform(lambda x: x.ffill().bfill())

проверка

In [11]:
df[df["user_id"] == 91875]
Out[11]:
user_id event_type event_time funnel_type experiment_name experiment_group
34702 91875 experiment_exposure 2024-01-10T07:21:02 male exp_8 test
34703 91875 onboarding_start 2024-01-10T07:21:02 male exp_8 test
34705 91875 profile_start 2024-01-10T07:21:16 male exp_8 test
34708 91875 email_submit 2024-01-10T07:21:36 male exp_8 test
34717 91875 paywall_show 2024-01-10T07:28:28 male exp_8 test

удаление строк с funnel_type из-за парсинга experiment_name experiment_group

смотрим есть ли такие пользователи, что у них единственная строка - это строка funnel_type = NaN, чтобы знать можно ли удалить

In [12]:
users_with_nan = df[df["funnel_type"].isna()]["user_id"]
user_event_counts = df["user_id"].value_counts()
# пользователи, у которых только одна строка в датасете
single_event_users = user_event_counts[user_event_counts == 1].index

# пересекаем с теми, у кого эта строка — с NaN funnel_type
users_with_only_nan_row = users_with_nan[users_with_nan.isin(single_event_users)].unique()
print(f"Таких пользователей: {len(users_with_only_nan_row)}")
print(users_with_only_nan_row)
Таких пользователей: 0
[]

нет таких, удаляем

In [13]:
df = df[df['event_type'] != 'experiment_exposure'].fillna(0)
df
Out[13]:
user_id event_type event_time funnel_type experiment_name experiment_group
193355 0 onboarding_start 2024-02-20T16:21:51 male 0 0
193356 0 profile_start 2024-02-20T16:22:05 male 0 0
193357 0 email_submit 2024-02-20T16:22:25 male 0 0
193375 0 paywall_show 2024-02-20T16:29:17 male 0 0
317565 1 onboarding_start 2024-03-24T07:48:11 main 0 0
... ... ... ... ... ... ...
86357 99998 paywall_show 2024-01-23T07:49:26 female exp_8 test
300974 99999 onboarding_start 2024-03-20T00:41:12 female 0 0
300975 99999 profile_start 2024-03-20T00:41:30 female 0 0
300976 99999 email_submit 2024-03-20T00:41:55 female 0 0
300999 99999 paywall_show 2024-03-20T00:49:25 female 0 0

321186 rows × 6 columns

In [14]:
df[df["user_id"] == 91875]
Out[14]:
user_id event_type event_time funnel_type experiment_name experiment_group
34703 91875 onboarding_start 2024-01-10T07:21:02 male exp_8 test
34705 91875 profile_start 2024-01-10T07:21:16 male exp_8 test
34708 91875 email_submit 2024-01-10T07:21:36 male exp_8 test
34717 91875 paywall_show 2024-01-10T07:28:28 male exp_8 test

Funnel¶

Вам дана выгрузка событий из web-воронки со следующими типами событий:

  • onboarding_start - начало прохождения воронки
  • profile_start - пользователь начал заполнение анкеты
  • email_submit - пользователь ввел свой почтовый адрес
  • paywall_show - показ пэйвола
  • payment_done - пользователь совершил покупку

Посчитайте какая доля пользователей от изначального проходит до каждого из этапов воронки и какая доля теряется на каждом из этапов.

подготовка¶

In [15]:
funnel_order = [
    'onboarding_start',
    'profile_start',
    'email_submit',
    'paywall_show',
    'payment_done'
]
In [16]:
def calculate_funnel_metrics(df, funnel_order, group_by_list):
    # считаем пользователей
    users_by_step = df.groupby(group_by_list)["user_id"].nunique().unstack(fill_value=0)

    # Упорядочек
    users_by_step = users_by_step[funnel_order]

    # Конверсии
    conv_from_start = users_by_step.div(users_by_step[funnel_order[0]], axis=0) * 100

    # Потери
    drop_offs = users_by_step.pct_change(axis=1).fillna(1) - 1
    drop_offs = -drop_offs * 100

    conv_from_start = conv_from_start.round(3)
    drop_offs = drop_offs.round(3)

    # Объединение
    result = []
    for funnel in users_by_step.index:
        for stage in funnel_order:
            result.append({
                'group_by': funnel,
                'event_type': stage,
                'users': users_by_step.loc[funnel, stage],
                'conversion_from_start': conv_from_start.loc[funnel, stage],
                'drop_off_from_previous': drop_offs.loc[funnel, stage]
            })

    return pd.DataFrame(result)

по funnel_type¶

In [17]:
by_event_type = calculate_funnel_metrics(df, funnel_order, group_by_list = ["funnel_type", "event_type"])
by_event_type
Out[17]:
group_by event_type users conversion_from_start drop_off_from_previous
0 female onboarding_start 54987 100.000 -0.000
1 female profile_start 46662 84.860 115.140
2 female email_submit 41967 76.322 110.062
3 female paywall_show 39555 71.935 105.747
4 female payment_done 2887 5.250 192.701
5 main onboarding_start 10027 100.000 -0.000
6 main profile_start 8326 83.036 116.964
7 main email_submit 575 5.735 193.094
8 main paywall_show 7897 78.757 -1173.391
9 main payment_done 683 6.812 191.351
10 male onboarding_start 34986 100.000 -0.000
11 male profile_start 28063 80.212 119.788
12 male email_submit 22685 64.840 119.164
13 male paywall_show 20026 57.240 111.721
14 male payment_done 1860 5.316 190.712
In [18]:
for event_type in by_event_type['group_by'].unique():
    current_df = by_event_type[by_event_type['group_by'] == event_type]
    data = dict(
        number=current_df['users'].tolist(),
        stage=current_df['event_type'].tolist(),
        text=[f"{int(x)}%" for x in current_df['conversion_from_start']]
    )

    fig = px.funnel(data, x='number', y='stage', text='text')
    fig.update_traces(textposition='inside')
    fig.update_layout(title=f"Общая воронка регистрации для funnel_type = {event_type}",
            width=800,
            height=500)
    fig.show()

Нет кучи событий по email_submit, считая что все этапы для воронки важны, можно предположить, что там данные пропали/нет информации

Около 10 процентов пользователей с таким встретились, поэтому можно их удалить ИЛИ оставить только тех, про кого есть информация, но тогда будет 100% проходимость

поэтому удаляем

In [19]:
df = df[df['funnel_type'] != 'main']
df
Out[19]:
user_id event_type event_time funnel_type experiment_name experiment_group
193355 0 onboarding_start 2024-02-20T16:21:51 male 0 0
193356 0 profile_start 2024-02-20T16:22:05 male 0 0
193357 0 email_submit 2024-02-20T16:22:25 male 0 0
193375 0 paywall_show 2024-02-20T16:29:17 male 0 0
5818 2 onboarding_start 2024-01-02T16:12:34 female 0 0
... ... ... ... ... ... ...
86357 99998 paywall_show 2024-01-23T07:49:26 female exp_8 test
300974 99999 onboarding_start 2024-03-20T00:41:12 female 0 0
300975 99999 profile_start 2024-03-20T00:41:30 female 0 0
300976 99999 email_submit 2024-03-20T00:41:55 female 0 0
300999 99999 paywall_show 2024-03-20T00:49:25 female 0 0

293678 rows × 6 columns

по experiment_name¶

In [20]:
by_experiment = calculate_funnel_metrics(df[df["experiment_name"] != 0], funnel_order, group_by_list = ["experiment_name", "experiment_group", "event_type"])
by_experiment[['experiment_name', 'group']] = by_experiment['group_by'].apply(pd.Series)
by_experiment
Out[20]:
group_by event_type users conversion_from_start drop_off_from_previous experiment_name group
0 (exp_0, control) onboarding_start 1371 100.000 -0.000 exp_0 control
1 (exp_0, control) profile_start 1146 83.589 116.411 exp_0 control
2 (exp_0, control) email_submit 994 72.502 113.264 exp_0 control
3 (exp_0, control) paywall_show 911 66.448 108.350 exp_0 control
4 (exp_0, control) payment_done 83 6.054 190.889 exp_0 control
... ... ... ... ... ... ... ...
75 (exp_9, test) onboarding_start 1387 100.000 -0.000 exp_9 test
76 (exp_9, test) profile_start 1218 87.815 112.185 exp_9 test
77 (exp_9, test) email_submit 1034 74.549 115.107 exp_9 test
78 (exp_9, test) paywall_show 956 68.926 107.544 exp_9 test
79 (exp_9, test) payment_done 74 5.335 192.259 exp_9 test

80 rows × 7 columns

пс: на графики можно навести и там будут цифры

In [21]:
for exp in by_experiment['experiment_name'].unique():
    current_df = by_experiment[by_experiment['experiment_name'] == exp]

    fig = px.funnel(
        current_df,
        x='users',
        y='event_type',
        color='group',  # control / test
        text=current_df['conversion_from_start'].apply(lambda x: f"{x:.0f}%")
    )

    fig.update_traces(textposition='inside')
    fig.update_layout(
        title=f"Воронка регистрации для {exp}",
        xaxis_title="Конверсия от старта (%)",
        yaxis_title="Этап",
            width=800,
            height=500
    )

    fig.show()

по experiment_name и funnel_type¶

In [22]:
by_everything = calculate_funnel_metrics(df[df["experiment_name"] != 0], funnel_order, group_by_list = ["funnel_type", "experiment_name", "experiment_group", "event_type"])
by_everything[['funnel_type', 'experiment_name', 'group']] = by_everything['group_by'].apply(pd.Series)
by_everything
Out[22]:
group_by event_type users conversion_from_start drop_off_from_previous funnel_type experiment_name group
0 (female, exp_0, control) onboarding_start 832 100.000 -0.000 female exp_0 control
1 (female, exp_0, control) profile_start 710 85.337 114.663 female exp_0 control
2 (female, exp_0, control) email_submit 635 76.322 110.563 female exp_0 control
3 (female, exp_0, control) paywall_show 602 72.356 105.197 female exp_0 control
4 (female, exp_0, control) payment_done 54 6.490 191.030 female exp_0 control
... ... ... ... ... ... ... ... ...
155 (male, exp_9, test) onboarding_start 546 100.000 -0.000 male exp_9 test
156 (male, exp_9, test) profile_start 461 84.432 115.568 male exp_9 test
157 (male, exp_9, test) email_submit 361 66.117 121.692 male exp_9 test
158 (male, exp_9, test) paywall_show 312 57.143 113.573 male exp_9 test
159 (male, exp_9, test) payment_done 21 3.846 193.269 male exp_9 test

160 rows × 8 columns

In [23]:
for exp in by_everything['experiment_name'].unique():
    df_exp = by_everything[by_everything['experiment_name'] == exp]

    for funnel_type in df_exp['funnel_type'].unique():
        df_sub = df_exp[df_exp['funnel_type'] == funnel_type]

        fig = px.funnel(
            df_sub,
            x='users',
            y='event_type',
            color='group',  # control / test
            text=df_sub['conversion_from_start'].apply(lambda x: f"{x:.0f}%")
        )

        fig.update_traces(textposition='inside')
        fig.update_layout(
            title=f"Воронка для {exp}, funnel_type = {funnel_type}",
            xaxis_title="Конверсия от старта (%)",
            yaxis_title="Этап",
            width=800,
            height=500
        )

        fig.show()

сравниваем¶

In [24]:
df_cr = by_experiment[by_experiment['event_type'] == 'payment_done']
In [25]:
results_ztest = []

for exp in df_cr['experiment_name'].unique():
    df_exp = df_cr[df_cr['experiment_name'] == exp]
    
    if {'control', 'test'}.issubset(df_exp['group'].unique()):
        control_row = df_exp[df_exp['group'] == 'control'].iloc[0]
        test_row = df_exp[df_exp['group'] == 'test'].iloc[0]

        control_n = int(control_row['users'])
        test_n = int(test_row['users'])

        control_conv = control_row['conversion_from_start']
        test_conv = test_row['conversion_from_start']

        uplift = (test_conv - control_conv) / control_conv * 100

        # success (платежи)
        count = [
            control_conv,
            test_conv
        ]
        nobs = [test_n, control_n]

        stat, p_value = proportions_ztest(count, nobs)

        results_ztest.append({
            'experiment_name': exp,
            'control_conv': round(control_conv, 2),
            'test_conv': round(test_conv, 2),
            'uplift_percent': round(uplift, 2),
            'p_value': round(p_value, 4)
        })

results_ztest_df = pd.DataFrame(results_ztest).sort_values(by='uplift_percent', ascending=True).reset_index(drop=True)
results_ztest_df
Out[25]:
experiment_name control_conv test_conv uplift_percent p_value
0 exp_8 6.13 5.15 -15.95 0.5501
1 exp_0 6.05 5.40 -10.72 0.6169
2 exp_1 6.99 6.84 -2.25 0.9416
3 exp_6 7.54 7.94 5.27 0.8284
4 exp_5 7.52 7.98 6.10 0.7188
5 exp_7 7.48 7.99 6.72 0.7620
6 exp_9 4.71 5.34 13.37 0.7300
7 exp_2 7.56 10.21 35.04 0.1905
In [26]:
df_cr = by_everything[by_everything['event_type'] == 'payment_done']
In [27]:
results_ztest = []

# Группируем по experiment_name и funnel_type
for (exp, funnel) in df_cr[['experiment_name', 'funnel_type']].drop_duplicates().itertuples(index=False):
    df_exp = df_cr[(df_cr['experiment_name'] == exp) & (df_cr['funnel_type'] == funnel)]

    if {'control', 'test'}.issubset(df_exp['group'].unique()):
        control_row = df_exp[df_exp['group'] == 'control'].iloc[0]
        test_row = df_exp[df_exp['group'] == 'test'].iloc[0]

        control_n = int(control_row['users'])
        test_n = int(test_row['users'])

        control_conv = control_row['conversion_from_start']
        test_conv = test_row['conversion_from_start']

        count = [
            control_conv,
            test_conv
        ]
        nobs = [test_n, control_n]

        stat, pval = proportions_ztest(count, nobs)
        uplift = ((test_conv - control_conv) / control_conv) * 100

        results_ztest.append({
            'experiment_name': exp,
            'funnel_type': funnel,
            'control_conv': round(control_conv, 2),
            'test_conv': round(test_conv, 2),
            'uplift_percent': round(uplift, 2),
            'p_value': round(pval, 4)
        })

results_ztest_df = pd.DataFrame(results_ztest).sort_values(by='p_value', ascending=True).reset_index(drop=True)
results_ztest_df
Out[27]:
experiment_name funnel_type control_conv test_conv uplift_percent p_value
0 exp_7 male 7.33 11.52 57.25 0.0061
1 exp_6 male 6.95 9.60 38.13 0.1096
2 exp_2 female 6.49 9.21 41.84 0.1702
3 exp_8 male 6.51 4.49 -30.96 0.1707
4 exp_9 female 4.23 6.30 48.91 0.2061
5 exp_2 male 9.70 12.01 23.83 0.2121
6 exp_9 male 5.48 3.85 -29.87 0.2268
7 exp_7 female 7.57 6.00 -20.66 0.3227
8 exp_5 female 5.34 6.72 25.67 0.3341
9 exp_0 female 6.49 5.32 -18.00 0.4333
10 exp_1 male 6.83 8.15 19.30 0.5253
11 exp_5 male 11.98 10.38 -13.37 0.5314
12 exp_1 female 7.08 6.17 -12.90 0.6507
13 exp_6 female 7.85 7.12 -9.33 0.7004
14 exp_8 female 5.89 5.56 -5.70 0.8599
15 exp_0 male 5.38 5.54 3.05 0.9392

Выводы¶

  • exp_2 - самый переспективный overall: максимально низкое p-value среди всех, uplift_percent наивысши. Эффект размыт за счёт смешивания male/female в одну метрику, но и по отдельности они ведут себя хорошо, в обеих группах результат положительный (аплифт +23.83% и +41.84%)

  • exp_7 male - самый переспективный для раскатки, НО только на мужскую аудиторию приложения: высокий рост конверсии при высокой достоверности результата, Uplift: +57.25%, для женской аудитории спад -20.66%, но overall результат положительный 6.72%

  • exp_9 несмотря на то, что в exp_6 male лучше (а для женщин ниже), overall результат на две аудитории хуже, чем у exp_9 (тут наоборот у женщин лучше) учитвая то, что p-value exp_9 < p_value exp_6 и приложение расчитано больше на женскую аудиторию, то этот тест становится наилучшим выбором из двоих

Спасибо за внимание